Techniques for Estimating the Computation and Communication Costs of Distributed Data Mining

Authors

  • Shonali Krishnaswamy
  • Arkady B. Zaslavsky
  • Seng Wai Loke
Abstract

Distributed Data Mining (DDM) is the process of mining distributed and heterogeneous datasets. DDM is widely seen as a means of addressing the scalability issue of mining large data sets. Consequently, there is an emerging focus on optimisation of the DDM process. In this paper we present cost formulae for estimating the communication and computation time for different distributed data mining scenarios.

Similar resources

Presented a method for estimating the cost of software using PCA to reduce the size and with the help of data mining

  These days, data mining is one of the most significant issues. Data mining is a field that mixes computer science and statistics, and it is considerably limited due to the increase in digital data and the growth of computational power of computers. One of the domains of data mining is software cost estimation. In this article, classifying techniques of learning algorithm of machine ...


Separating indexes from data: a distributed scheme for secure database outsourcing

Database outsourcing is an idea to eliminate the burden of database management from organizations. Since data is a critical asset of organizations, preserving its privacy from outside adversary and untrusted server should be warranted. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...


Design and Analysis of a Dynamic Load Balancing Strategy for Large-Scale Distributed Association Rule Mining

Association rule mining is one of the most important data mining techniques. Algorithms of this technique search a large space, considering numerous different alternatives and scanning the data repeatedly. Parallelism seems to be the natural solution in order to be able to work with industrial-sized databases. Large-scale computing systems, such as Grid computing environments, are recently rega...


Buffer-Safe Communication Optimization based on Data Flow Analysis and Performance

This paper presents a novel approach to reduce communication costs of programs for distributed memory machines. Our techniques are based on uni-directional bit-vector data flow analysis that enable vectorizing and coalescing communication, overlapping communication with computation, eliminating redundant messages and amount of data being transferred both within and across loop nests. Our data flow ...


Buffer-Safe Communication Optimization based on Data

This paper presents a novel approach to reduce communication costs of programs for distributed memory machines. Our techniques are based on uni-directional bit-vector data flow analysis that enable vectorizing and coalescing communication, overlapping communication with computation, eliminating redundant messages and amount of data being transferred both within and across loop nests. Our data flow ...


Publication date: 2002